XploR is an R package designed for robust allelic imbalance and large-scale copy number analysis from whole exome sequencing (WES) data in clinical genomics. It features advanced noise reduction using a panel of normal samples for both coverage and allelic counts, comprehensive smoothing and segmentation algorithms, and accurate purity and ploidy estimation. XploR supports flexible rerun options based on chromosome region, tumor purity, or diploid coverage, and includes integrated ISCN annotation and visualization. These capabilities make XploR suitable for both clinical and research applications in genomic copy number analysis.
Install the latest version from GitHub using devtools:
All files needed for a test run are placed in the inst/extdata folder. RunExamplePipeline() uses the files in inst/extdata for a test run. Panel of normals generation is not included in the test run. For details on building a panel of normals, refer to Prepare reference files.
RunAIsegmentation(
seg = seg,
cov = cov,
ai = ai,
gender = gender,
out_dir = out_dir,
prefix = prefix,
ai_pon = ai_pon,
aitype = "dragen"
)

RunAIsegmentation

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| seg | character | Path to the GATK segment file. | "sample.seg" |
| cov | character | Path to the GATK denoised coverage count file. | "sample.counts" |
| ai | character | Path to the BAF file or allelic count file. | "sample.baf" |
| ai_pon | character | Path to PON Rdata. AI panel of normals generated by PONAIprocess. | "PON_AI.Rdata" |
| gender | character | Sample gender ("female" or "male"), passed to ReadAI(). | "female" |
| out_dir | character | Output directory path. | "results/" |
| prefix | character | Output file prefix. | "Sample1" |
| mergeai | numeric | MAF difference threshold for merging segments under “merge” segmentation mode (default: 0.15). | 0.15 |
| mergecov | numeric | CNV difference threshold for merging segments (default: 0.2). | 0.2 |
| snpmin | numeric | Minimum SNPs for MAF segmentation under “merge” segmentation mode (default: 7). | 7 |
| minsnpcov | numeric | Minimum coverage of SNPs to be included (default: 20). | 20 |
| maxgap | numeric | Maximum gap size inside a bin; if exceeded, start a new bin (default: 1,000,000). | 1000000 |
| snpnum | integer | SNP number in each bin (default: 30). | 30 |
| maxbinsize | numeric | Maximum bin size (default: 5,000,000). | 5000000 |
| minbinsize | numeric | Minimum bin size (default: 500,000). | 500000 |
| minsnpcallaicutoff | numeric | Minimum SNPs for reliable CNLOH/GAINLOH (default: 10). | 10 |
| mergecovminsize | numeric | Minimum size for GATK segment merge (default: 500,000). | 500000 |
| segmethod | character | Segmentation method: "merge" for stepwise merging, "cbs" for CBS segmentation. | "cbs" |
| cbssmooth | character | If using CBS, "yes" to apply smoothing before segmentation, "no" to skip smoothing. | "yes" |
| aitype | character | Type of allelic imbalance data: "gatk", "other", or "dragen" (see below for requirements). | "dragen" |
Note on aitype column requirements:
- If "gatk" or "other": input must include columns CONTIG, POSITION, ALT_COUNT, REF_COUNT, REF_NUCLEOTIDE, and ALT_NUCLEOTIDE.
- If "dragen": input must include columns contig, start, stop, refAllele, allele1, allele2, allele1Count, allele2Count, allele1AF, and allele2AF.
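Before running the segmentation step, it can help to check that an allelic count table carries the columns listed in the note above. The sketch below is not part of XploR; `required_ai_columns()` and `missing_ai_columns()` are hypothetical helper names, and the column lists are copied from the note.

```r
# Hypothetical helpers (not XploR functions): list and check the columns
# required for each aitype value, as stated in the note above.
required_ai_columns <- function(aitype) {
  switch(aitype,
    gatk = ,  # "gatk" falls through to the same column set as "other"
    other = c("CONTIG", "POSITION", "ALT_COUNT", "REF_COUNT",
              "REF_NUCLEOTIDE", "ALT_NUCLEOTIDE"),
    dragen = c("contig", "start", "stop", "refAllele", "allele1", "allele2",
               "allele1Count", "allele2Count", "allele1AF", "allele2AF"),
    stop("Unknown aitype: ", aitype)
  )
}

# Return the required columns that are absent from a data frame.
missing_ai_columns <- function(df, aitype) {
  setdiff(required_ai_columns(aitype), names(df))
}

# Toy table that lacks the two nucleotide columns.
toy <- data.frame(CONTIG = "1", POSITION = 12345L, ALT_COUNT = 10L, REF_COUNT = 12L)
missing_ai_columns(toy, "gatk")
```

An empty result means the table has every column named in the note for that aitype.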
RunModelLikelihood(
seg = paste0(out_dir,"/",prefix,"_GATK_AI_segment.tsv"),
out_dir = out_dir,
prefix = prefix,
gender = gender,
modelminprobes = 20,
modelminAIsize = 5000000,
minsf = 0.4,
callcov = 0.3,
thread = 6)

RunModelLikelihood

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| seg | character | Path to the combined segment file (e.g., output from segmentation step above) | "results/Sample1_GATK_AI_segment.tsv" |
| out_dir | character | Output directory for results | "results/" |
| prefix | character | Prefix for output files | "Sample1" |
| gender | character | Sample gender ("male" or "female") | "female" |
| modelminprobes | integer | Minimum number of probes/SNPs per segment to include in modeling | 20 |
| modelminAIsize | numeric | Minimum segment size (bp) to include in modeling | 5000000 |
| minsf | numeric | Minimum scale factor to consider in model selection | 0.4 |
| callcov | numeric | Subclonal events calling cutoff based on total copy number | 0.3 |
| thread | integer | Number of CPU threads to use for parallel processing | 6 |
| callcovcutoff | numeric | (Optional) Threshold for calling without modeling. | 0.3 |
| callaicutoff | numeric | (Optional) Threshold for calling without modeling. | 0.3 |
| minsnpcallaicutoff | integer | (Optional) Minimum SNPs to call AI segment | 10 |
Notes:
See ?RunModelLikelihood in R.
AnnotateSegments(
input = paste0(out_dir,"/",prefix,"_final_calls.tsv"),
out_dir = out_dir,
prefix = prefix,
cytoband = cytoband,
whitelist_edge = whitelist_edge,
gene = gene)

AnnotateSegments

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| input | character | Path to XploR CNV calling output. | "results/Sample1_final_calls.tsv" |
| out_dir | character | Output directory for results | "results/" |
| prefix | character | Prefix for output files | "Sample1" |
| cytoband | character | Path to cytoband annotation file (TSV). See Prepare input for detail. | "data/cytoBand.txt" |
| whitelist_edge | character | Path to the detectable edges for each chromosome. See Prepare input for detail. | "data/whitelist.txt" |
| gene | character | Path to gene annotation file. See Prepare input for detail. | "data/gene_anno.txt" |
RunPlotCNV(
seg = paste0(out_dir,"/",prefix,"_CNV_annotation.tsv"),
cr = cr,
ballele = ai,
ai_binsize = 100000,
cov_binsize = 100000,
whitelist = whitelist_bed,
gender = gender,
out_dir = out_dir,
prefix = prefix,
aitype = "dragen"
)

RunPlotCNV

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| seg | character | Path to final annotated call file. | "results/Sample1_CNV_annotation.tsv" |
| cr | character | Path to the GATK denoised copy ratio file with extension .denoisedCR.tsv | "data/sample.denoisedCR.tsv" |
| ballele | character | Path to the B-allele file (from DRAGEN, GATK, or other source). See aitype for required columns. | "data/sample.tumor.baf.gz" |
| ai_binsize | numeric | Bin size for AI plot (default: 100,000) | 100000 |
| cov_binsize | numeric | Bin size for coverage plot (default: 100,000) | 100000 |
| whitelist | character | Path to whitelist file for regions to include | "data/whitelist.txt" |
| gender | character | Sample gender ("male" or "female") | "female" |
| out_dir | character | Output directory for plot | "results/" |
| prefix | character | Sample ID or output prefix | "Sample1" |
| aitype | character | Type of allelic imbalance data: "gatk", "dragen", or "other". | "dragen" |
BafQC(
annofile = paste0(out_dir,"/",prefix,"_CNV_annotation.tsv"),
out_dir = out_dir,
prefix = prefix)

BafQC

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| annofile | character | Path to the CNV annotation file (e.g., *_CNV_annotation.tsv) | "results/Sample1_CNV_annotation.tsv" |
| out_dir | character | Output directory for the QC summary file | "results/" |
| prefix | character | Prefix for the QC output file | "Sample1" |
| aitype parameter value | Software | Minimum columns | File extension |
|---|---|---|---|
| dragen | Illumina DRAGEN | contig, start, refAllele, allele2, allele1Count, allele2Count | "sample..tumor.ballele.counts.gz" |
| gatk | GATK | CONTIG, POSITION, ALT_COUNT, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE | "sample.allelic_counts" |
| other | Other (e.g. samtools) | CONTIG, POSITION, ALT_COUNT, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE | "" |
A Panel of Normals (PON) is required and should be generated using GATK, DRAGEN, or any other software capable of producing allelic count files.
Note: Male and female PON files need to be generated separately.
These files are generated from the PON HDF5 file (from GATK), a cytoband file, and gender information. They are essential for downstream processing and include:
These files are created based on the GATK Panel of Normals.
See the function documentation in R: ?PonProcess or
help("PonProcess", package = "XploR").
Example usage:
PonProcess(
pon_file = pon_hdh5_file,
blacklist_bed = output_blacklist_bed,
whitelist_bed = output_whitelist_bed,
cytoband = cytoband,
detectable_edge = output_detectable_edge,
gender = gender
)
The ai_pon_file should be a text file listing the paths to normal allelic count files generated by GATK, DRAGEN, or other software.
You can process these files to generate the PON reference for allelic imbalance using:
PONAIprocess(
ai_pon_file = ai_pon_file,
aitype = "gatk",
minsnpcov = 20,
output = "/Pathtoresults",
prefix = "PONAI",
maxgap = 2000000,
maxbinsize = 5000000,
minbinsize = 500000,
snpnum = 30,
gender = "female"
)

PONAIprocess

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| ai_pon_file | character | Path to a text file listing PoN AI file paths (one per line) | "pon_ai_file_list.txt" |
| aitype | character | Type of AI input file ("gatk", "dragen", or "other"), passed to ReadPonAI() | "gatk" |
| minsnpcov | integer | Minimum SNP coverage to include a site in the AI calculation | 20 |
| maxgap | numeric | Maximum allowed gap between SNPs within a bin (in base pairs) | 1000000 |
| maxbinsize | numeric | Maximum allowed bin size (in base pairs) | 5000000 |
| minbinsize | numeric | Minimum allowed bin size (in base pairs) | 500000 |
| snpnum | integer | Target number of SNPs per bin | 30 |
| output | character | Output directory for the processed PoN AI Rdata file | "results/" |
| prefix | character | Prefix for the output file | "PON" |
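The list file passed as ai_pon_file (one path per line) can be assembled with base R. The sketch below uses a temporary directory and made-up file names purely for illustration; in practice the directory and the file name pattern depend on how the normal allelic count files were produced.

```r
# Illustration only: create two empty placeholder files standing in for
# normal allelic count files. Directory and file names are hypothetical.
normal_dir <- file.path(tempdir(), "pon_normals")
dir.create(normal_dir, showWarnings = FALSE)
file.create(file.path(normal_dir,
                      c("normalA.allelic_counts", "normalB.allelic_counts")))

# Collect the full paths and write them one per line.
ai_files <- list.files(normal_dir, pattern = "\\.allelic_counts$",
                       full.names = TRUE)
ai_pon_file <- file.path(tempdir(), "pon_ai_file_list.txt")
writeLines(ai_files, ai_pon_file)

readLines(ai_pon_file)
```

The resulting path can then be supplied as the ai_pon_file argument of PONAIprocess().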
Gene annotation can be obtained from various sources (e.g., Ensembl, UCSC, Gencode, RefSeq). An example file is included with the package:
gene <- system.file("extdata", "RefSeqCurated.genePred.gene_region.txt", package = "XploR")
head(read.table(gene, header = TRUE, sep = "\t"))
Cytoband annotation files are typically downloaded from UCSC. An example file is included:
cytoband <- system.file("extdata", "hg19_cytoBand.dat", package = "XploR")
head(read.table(cytoband, header = TRUE, sep = "\t"))
The BinMaf function implements a flexible binning
strategy for minor allele frequency (MAF) data, supporting both tumor
samples and panels of normal (PoN) samples. Binning uses a fixed number
of SNPs per bin, with additional criteria to handle genomic gaps and
bin size limits. Within each bin, Gaussian
mixture modeling (GMM) is applied to identify clusters in the MAF
distribution.
Key features:
This strategy ensures that bins are of consistent size and SNP content, while avoiding the inclusion of widely separated SNPs in the same bin, and is robust for both tumor and normal samples.
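The binning idea can be sketched as follows. This is a minimal illustration, not the BinMaf implementation: `bin_snps()` is a hypothetical helper that works on the sorted SNP positions of a single chromosome and starts a new bin when the bin already holds snpnum SNPs, when the gap to the previous SNP exceeds maxgap, or when the bin span would exceed maxbinsize. Handling of minbinsize and the per-bin GMM step are left out.

```r
# Hypothetical sketch of SNP-count binning with gap and size limits.
# pos: sorted SNP positions (bp) on one chromosome.
bin_snps <- function(pos, snpnum = 30, maxgap = 1e6, maxbinsize = 5e6) {
  stopifnot(!is.unsorted(pos))
  bin_id <- integer(length(pos))
  if (length(pos) == 0) return(bin_id)
  current <- 1L         # id of the bin being filled
  bin_start <- pos[1]   # position of the first SNP in the current bin
  n_in_bin <- 0L        # SNPs already placed in the current bin
  for (i in seq_along(pos)) {
    new_bin <- i > 1 && (
      pos[i] - pos[i - 1] > maxgap ||   # large gap between neighbouring SNPs
      n_in_bin >= snpnum ||             # bin already holds the target SNP count
      pos[i] - bin_start > maxbinsize   # bin would grow beyond the size limit
    )
    if (new_bin) {
      current <- current + 1L
      bin_start <- pos[i]
      n_in_bin <- 0L
    }
    bin_id[i] <- current
    n_in_bin <- n_in_bin + 1L
  }
  bin_id
}

# Five SNPs with a 4.7 Mb gap before the fourth one; at most two SNPs per bin.
pos <- c(1e5, 2e5, 3e5, 5e6, 5.1e6)
bin_snps(pos, snpnum = 2)
```

With these toy positions the first two SNPs share a bin, the third opens a new bin because the SNP limit is reached, and the large gap forces another new bin for the last two.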
In addition to CBS (Circular Binary Segmentation), our pipeline supports a “merge” mode for segmentation based on minor allele frequency (MAF) values. While CBS is the default and recommended strategy, “merge” mode offers a step-wise, rule-based approach to combine adjacent MAF segments.
Step-wise Merging Strategy:
mergeai parameter (user-specified MAF
difference threshold).Note:
To correct for systematic bias in MAF estimates, we use a panel of normal (PoN) allelic count files as a reference. For each segment in the tumor or sample of interest, we compare the segment’s MAF to the distribution of MAF values observed in the PoN for the same genomic region. This process ensures that technical or locus-specific biases in MAF are removed only when the tumor segment does not show significant deviation from the normal reference.
Correction process:
- For each segment, identify all overlapping PoN segments and extract their MAF values.
- If the segment’s MAF is not significantly different from the PoN MAF distribution (assessed via a Wilcoxon test or a small absolute difference), apply a logit-based centering correction:
  - The segment MAF is transformed to the logit scale, centered by the PoN median MAF, and then inverse-logit transformed back and capped at 0.5.
- If the segment’s MAF is significantly different from the PoN, the original segment MAF is retained (no correction is applied), thus preserving true biological signal.
- This approach ensures that only technical or systematic biases are corrected, while real allelic imbalance events in the tumor are preserved.
This method uses the panel of normals as an adaptive reference, providing robust bias correction without shrinking true tumor signals.
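A minimal sketch of this correction for a single segment is shown below. It is an illustration under stated assumptions, not the package code: `correct_segment_maf()` is a hypothetical name, the significance level `alpha` and the absolute-difference tolerance `tol` are made-up values, the test is taken to be a one-sample Wilcoxon test of the PoN MAF values against the segment MAF, and "centering" is read as subtracting the logit of the PoN median.

```r
# Hypothetical sketch of PoN-based MAF bias correction for one segment.
# seg_maf: MAF of the tumor segment; pon_maf: MAF values of overlapping PoN segments.
# alpha and tol are assumed thresholds, not XploR defaults.
correct_segment_maf <- function(seg_maf, pon_maf, alpha = 0.05, tol = 0.03) {
  pon_median <- median(pon_maf)
  # One-sample Wilcoxon test of the PoN values against the segment MAF
  # (normal approximation, so tied values do not trigger exact-test warnings).
  p_value <- suppressWarnings(
    wilcox.test(pon_maf, mu = seg_maf, exact = FALSE)$p.value
  )
  similar <- abs(seg_maf - pon_median) <= tol ||
    (!is.na(p_value) && p_value >= alpha)
  if (!similar) {
    return(seg_maf)  # significantly different: keep the observed MAF
  }
  # Centre on the logit scale by the PoN median, back-transform, cap at 0.5.
  corrected <- plogis(qlogis(seg_maf) - qlogis(pon_median))
  min(corrected, 0.5)
}

pon <- c(0.44, 0.45, 0.46, 0.45, 0.44, 0.46, 0.45)
correct_segment_maf(0.44, pon)  # close to the PoN: shifted towards 0.5
correct_segment_maf(0.20, pon)  # clear imbalance: returned unchanged
```

Because the logit of 0.5 is zero, a segment whose MAF equals the PoN median is mapped to exactly 0.5 under this reading.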
To accurately model over-dispersion in minor allele frequency (MAF) data, we estimate a beta-binomial dispersion parameter (\(\theta\)) using a panel of normal (PoN) samples. This allows us to account for extra-binomial variation and improves the likelihood calculation for each segment.
For each bin \(b\) and depth stratum:
Within each depth stratum, we take a robust center (median) of \(\widehat{\theta}_b\) to obtain \(\theta\) for that stratum.
where:
Priors are assigned to each potential copy number combination based on the principle of parsimony, which favors simpler (biologically less complex) allele configurations. The biological difficulty level reflects the number of steps required to reach a given allele combination from the baseline diploid state (1,1), where each step represents either a gain or loss of one allele.
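One way to read the step count is as the total number of single-allele gains or losses separating (major, minor) from (1,1). The sketch below computes that count and turns it into a normalized prior with an exponential decay; the decay form is an assumption chosen for illustration only, since the exact mapping from difficulty to prior is not given here, and `bio_difficulty()` is a hypothetical helper name.

```r
# Steps from the diploid baseline (1,1): one step per allele gained or lost.
bio_difficulty <- function(major, minor) {
  abs(major - 1) + abs(minor - 1)
}

combos <- data.frame(
  major = c(1, 2, 2, 1, 3),
  minor = c(1, 1, 0, 0, 1)
)
combos$Bio_diff <- bio_difficulty(combos$major, combos$minor)

# ASSUMPTION: prior decays exponentially with difficulty, then is normalized.
# The actual prior used by XploR may differ.
combos$prior <- exp(-combos$Bio_diff)
combos$prior <- combos$prior / sum(combos$prior)
combos
```

Under this reading the diploid state (1,1) needs zero steps and receives the largest prior, while copy-neutral LOH (2,0) needs two steps: one loss and one gain.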
For each genomic segment, the model computes a range of possible
tumor-specific copy numbers (CN_tumor) that could result
from observed data under different cancer cell fractions (ccf):
\[ CN_{tumor} = \frac{C_i \times 2 / (\mu \times 100) - (1 - \rho) \times 2 - \rho \times (1 - ccf) \times 2}{\rho \times ccf} \]
where:
\(C_i\): Observed segment copy number
\(\mu\): Diploid coverage scale factor
\(\rho\): Tumor purity
\(ccf\): Cancer cell fraction
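The formula translates directly into R. In the sketch below, `tumor_cn()` is a hypothetical helper name; the argument names mirror the symbols above, and the formula implies that \(C_i\) is on a scale where \(\mu \times 100\) corresponds to two copies.

```r
# Tumor-specific copy number implied by an observed segment value C_i,
# diploid coverage scale factor mu, purity rho and cancer cell fraction ccf.
tumor_cn <- function(C_i, mu, rho, ccf) {
  (C_i * 2 / (mu * 100) - (1 - rho) * 2 - rho * (1 - ccf) * 2) / (rho * ccf)
}

# A sample-level average of 3 copies at 50% purity and ccf = 1 implies
# 4 copies in the tumor cells, since 0.5 * 2 + 0.5 * 4 = 3.
tumor_cn(150, mu = 1, rho = 0.5, ccf = 1)

# Range of tumor copy numbers for one segment across candidate ccf values.
ccf_grid <- seq(0.2, 1, by = 0.2)
data.frame(ccf = ccf_grid,
           CN_tumor = tumor_cn(130, mu = 1, rho = 0.8, ccf = ccf_grid))
```

Lower ccf values require a more extreme tumor copy number to explain the same observed signal, which is why a range of candidates is evaluated per segment.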
Combination Generation:
For each potential tumor CN, all feasible major and minor allele
combinations are generated and filtered based on biological
plausibility.
CCF Value Calculation:
For non-diploid segments, CCF is calculated; for diploid segments, it is
set as NA.
\[ \mathrm{Beta}(\alpha, \beta) \]
where:
\(\alpha = K \times \mathrm{BAF} + \epsilon\)
\(\beta = K \times (1 - \mathrm{BAF}) + \epsilon\)
\(K\): Beta-binomial precision parameter, estimated for each segment based on local read depth and the over-dispersion parameter (see below).
\(\epsilon\): Small positive value for numerical stability
Estimation of \(K\):
For each segment, \(K\) is calculated as:
\[ K = \frac{\text{depth}}{1 + (\text{depth} - 1) \cdot \theta} - 1 \]
where:
\(\text{depth}\): Median SNP depth for the segment
\(\theta\): Beta-binomial over-dispersion parameter, estimated from the panel of normals (PoN) for the corresponding depth stratum
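The two formulas above can be combined into a small likelihood sketch. The helper names `beta_precision()` and `maf_loglik()` are hypothetical, and the sketch assumes that BAF in the shape parameters is the expected value for an allele configuration and that the density is evaluated at the observed segment MAF; the value of `eps` is also an assumption.

```r
# Precision K from the median segment depth and the PoN over-dispersion theta.
beta_precision <- function(depth, theta) {
  depth / (1 + (depth - 1) * theta) - 1
}

# Log-density of the observed MAF under Beta(alpha, beta).
# ASSUMPTION: expected_baf is the BAF predicted by the allele configuration.
maf_loglik <- function(obs_maf, expected_baf, depth, theta, eps = 1e-6) {
  K <- beta_precision(depth, theta)
  alpha <- K * expected_baf + eps
  beta <- K * (1 - expected_baf) + eps
  dbeta(obs_maf, alpha, beta, log = TRUE)
}

beta_precision(100, theta = 0)     # no over-dispersion: K = depth - 1
beta_precision(100, theta = 0.01)  # over-dispersion lowers the precision

# An observed MAF near the expected value scores higher than a distant one.
maf_loglik(0.33, expected_baf = 0.33, depth = 100, theta = 0.01)
maf_loglik(0.45, expected_baf = 0.33, depth = 100, theta = 0.01)
```

A larger \(\theta\) shrinks \(K\), which widens the Beta distribution and makes the likelihood less sensitive to small MAF differences.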
Posterior Likelihood:
The posterior likelihood for each combination incorporates both the MAF
likelihood and the prior, weighted by a factor \(\gamma\):
\[ \text{Posterior Likelihood} = \text{MAF Likelihood} \times (\text{Prior})^\gamma \]
where:
The SelectCallpersegment() function refines and selects
the most likely allele combinations for each genomic segment, handling
both clonal and subclonal events, and incorporates coverage differences
and prior knowledge.
cov_diff) from observed coverage.
cov_diff.
minor = 0:
minor = 0, where MAF likelihood is unreliable, select the model with the smallest cov_diff.
minor = 0 and minor ≠ 0 cases.
log_MAF_likelihood) for each selected model.

sample_GATK_AI_segment.tsv (Generated by
?RunAIsegmentation function)
| Column | Type | Description | Example_value |
|---|---|---|---|
| Sample | character | Sample identifier | Sample1 |
| Chromosome | character | Chromosome name | 1 |
| Start | integer | Start position (base pair) | 123456 |
| End | integer | End position (base pair) | 234567 |
| Num_Probes | integer | Number of probes/SNPs in the segment | 25 |
| Segment_Mean | numeric | Segment mean (log2 ratio) from CNV analysis | 0.42 |
| gatk_SM_raw | numeric | Raw segment mean from GATK | 0.38 |
| gatk_count | integer | Number of counts in GATK segment | 30 |
| gatk_baselinecov | numeric | The GATK baseline is an intermediate value calculated using gatk_SM_raw and gatk_count. | 100.5 |
| gatk_gender | character | Gender as reported by GATK | female |
| pipeline_gender | character | Gender as used in pipeline | female |
| MAF | numeric | Minor allele frequency for the segment | 0.21 |
| MAF_Probes | integer | Number of probes used to calculate MAF | 18 |
| MAF_gmm_G | integer | Number of GMM clusters in MAF distribution | 2 |
| MAF_gmm_weight | numeric | Mixture weight of the main GMM cluster | 0.85 |
| size | integer | Segment size in base pairs | 111111 |
| BreakpointSource | character | Source of breakpoint (GATK or Postprocess) | GATK |
| FILTER | character | Quality tag for the segment (PASS or FAILED) | PASS |
sample_likelihood_raw.tsv (Generated by
?RunModelLikelihood() function)
| Column | Type | Description | Example_value |
|---|---|---|---|
| major | integer | Major allele copy number | 2 |
| minor | integer | Minor allele copy number | 1 |
| CN | integer | Total copy number (major + minor) | 3 |
| ccf | numeric | Cancer cell fraction | 0.85 |
| Bio_diff | integer | Biological difficulty score for the allele combination | 3 |
| prior | numeric | Prior probability for the allele combination | 0.12 |
| expected_maf | numeric | Expected minor allele frequency for this configuration | 0.21 |
| maf_ll | numeric | Log-likelihood for the observed MAF under this configuration | -0.56 |
| weighted_prior | numeric | Weighted log-prior (prior × gamma) | -2.13 |
| exp_maf_ll | numeric | Exponentiated MAF log-likelihood | 0.57 |
| exp_prior | numeric | Exponentiated weighted prior | 0.11 |
| MAF_likelihood | numeric | Posterior likelihood for this configuration | 0.065 |
| Segcov | numeric | Pseudo Segment coverage | 280 |
| MAF | numeric | Observed minor allele frequency | 0.19 |
| mu | numeric | Diploid coverage scale factor | 1.0 |
| rho | numeric | Tumor purity (fraction between 0 and 1) | 0.7 |
| index | character | Segment index or identifier | "12" |
| Tag | character | Segment inclusion/exclusion tag for summarizing total likelihood for a model (e.g., "Include", "Exclude") | "Include" |
| ccf_MAF | numeric | Cancer cell fraction estimated from MAF and allele configuration only | 0.81 |
sample_top_likelihood_calls.tsv (Generated by
?SelectCallpersegment() function). The format is similar
to sample_likelihood_raw.tsv, with the best allelic
combination selected for each segment under each diploid coverage
scale factor and tumor purity configuration.
sample_Models_likelihood.tsv ( Generated by
?SelectFinalModel() function )
| Column | Type | Description | Example_value |
|---|---|---|---|
| mu | numeric | Diploid coverage scale factor (model parameter) | 1.0 |
| rho | numeric | Tumor purity (model parameter, fraction between 0 and 1) | 0.7 |
| total_log_likelihood_before_refine | numeric | Total log-likelihood for the model before refinement | -1234.5 |
| segments_n | integer | Number of segments included in the model | 27 |
| Likelihood_penalty_rows | integer | Number of segments penalized due to failed likelihood calculation | 2 |
| total_log_likelihood_after_refine | numeric | Total log-likelihood for the model after refinement | -1220.2 |
| diploid_n | integer | Number of diploid segments in the model | 15 |
| diploid_distance_to_integer | numeric | Mean distance to integer copy number for diploid segments | 0.04 |
| nondiploid_n | integer | Number of non-diploid segments in the model | 12 |
| nondiploid_distance_to_integer | numeric | Mean distance to integer copy number for non-diploid segments | 0.11 |
| total_distance_to_integer | numeric | Sum of diploid and non-diploid mean distances to integer copy number | 0.15 |
| ploidy | numeric | Mean copy number (ploidy) across all segments | 2.4 |
| Tier1 | character | Model tier label (e.g., "Tier1_Models", "Final_model_MAF") | "Tier1_Models" |
| total_likelihood_cluster | integer | Rank based on total likelihood ( lower is better ) | 1 |
| diploid_distance_cluster | integer | Rank based on diploid distance to integer copy number ( lower is better ) | 1 |
| nondiploid_distance_cluster | integer | Rank based on non-diploid distance to integer copy number (lower is better) | 1 |
| total_likelihood_cluster_mean | numeric | Mean total log-likelihood for the level | -1200.0 |
| diploid_distance_cluster_mean | numeric | Mean diploid distance to integer for the level | 0.03 |
| nondiploid_distance_cluster_mean | numeric | Mean non-diploid distance to integer for the level | 0.10 |
sample_final_calls.tsv (Generated by
?RunModelLikelihood() function)
| Column | Type | Description | Example_value |
|---|---|---|---|
| Chromosome | character | Chromosome name | 1 |
| Start | integer | Start position (base pair) | 3301463 |
| End | integer | End position (base pair) | 247784114 |
| size | integer | Segment size (bp) | 244367069 |
| Num_Probes | integer | Number of probes from GATK segment file | 222 |
| Call | character | Copy number call (e.g., REF, GAIN, LOSS, GAINLOH, CNLOH) | REF |
| ccf_COV | numeric | Cancer cell fraction estimated from coverage | 1 |
| ccf_MAF | numeric | Cancer cell fraction estimated from MAF | 0 |
| ccf_final | numeric | Final cancer cell fraction after refinement | 1 |
| Segment_Mean | numeric | Final Segment mean (log2 ratio) | 0.057631093 |
| CNF_correct | numeric | Purity corrected copy number estimate from coverage | 2.086898584 |
| major | integer | Major allele copy number | 1 |
| minor | integer | Minor allele copy number | 1 |
| CN | integer | Total copy number (major + minor) | 2 |
| MAF | numeric | Observed minor allele frequency | 0.5 |
| MAF_correct | numeric | Purity corrected minor allele frequency | 0.5 |
| expected_maf | numeric | Expected minor allele frequency for this configuration | 0.5 |
| expected_cov | numeric | Expected pseudo coverage for this segment | 90 |
| MAF_Probes | integer | Number of probes used for MAF calculation | 1110 |
| MAF_gmm_G | integer | Number of GMM clusters in MAF distribution | 5 |
| MAF_gmm_weight | numeric | Mixture weight of the main GMM cluster | 0.667871528 |
| BreakpointSource | character | Source of breakpoint (GATK or Postprocess) | GATK |
| FILTER | character | Quality tag for the segment (PASS or FAILED) | PASS |
| maf_ll | numeric | Log-likelihood for the observed MAF | 2.625299941 |
| MAF_likelihood | numeric | Posterior likelihood for this configuration | 8.891628731 |
| mu | numeric | Diploid coverage scale factor | 0.9 |
| rho | numeric | Tumor purity (fraction between 0 and 1) | 0.938 |
| index | character | Segment index or identifier | 1 |
| gatk_SM_raw | numeric | Raw segment mean from GATK | -0.094372 |
| gatk_count | integer | Number of counts in GATK segment | 361 |
| gatk_baselinecov | numeric | The GATK baseline is an intermediate value calculated using gatk_SM_raw and gatk_count. | 385.4038109 |
| gatk_gender | character | Gender as reported by GATK | female |
| pipeline_gender | character | Gender as used in pipeline | female |
| CN_mix | character | Indicator for copy number mixture (No or CN_Mix) | No |
| Model_source | character | Source of model selection (Coverage, Coverage + MAF, Diploid) | Coverage + MAF |
Likelihood dot plot: The plot displays the likelihood ranking
for all combinations of diploid coverage scale factor and tumor purity.
The vertical dashed line indicates the likelihood cutoff used to define
Tier 1 models.
Model plot: The model
plot displays the likelihood values of different models, which are
calculated based on potential combinations of diploid coverage scale
factor and tumor purity. In the plot, red indicates higher likelihood,
while blue signifies lower likelihood. The light blue dot indicates the
final model selected by XploR.
Tier1 Models Overall: This plot shows
copy number calls for each combination of diploid coverage scale factor
and tumor purity. Red indicates gain, blue indicates loss, and white
indicates no change. Each configuration is labeled on the y-axis. By
evaluating coverage and allelic imbalance patterns in this overview, you
can identify the reasonable range of diploid coverage scale factors and
tumor purity values. This helps guide reruns with optimized parameter
ranges if needed.
Tier1 Models Zoom in: A zoomed-in
view that makes the y-axis configurations more visible for detailed
inspection.
sample_PASS_STAT_chr.txt ( Generated by ?BafQC()
function )
| Column | Type | Description | Example_value |
|---|---|---|---|
| chrom | character | Chromosome name (e.g., 1, 2, …, X, Y) | 1 |
| FILTER | character | Segment filter status | PASS |
| Total_segment_count | integer | Total number of segments on the chromosome | 25 |
| PASS_Seg_Count | integer | Number of segments with PASS filter status | 20 |
| PASS_Seg_Percent | numeric | Fraction of segments with PASS status (0–1) | 0.80 |
| Total_segment_size | integer | Total size (bp) of all segments on the chromosome | 249250621 |
| PASS_Seg_Size | integer | Total size (bp) of PASS segments on the chromosome | 199400497 |
| PASS_Seg_Size_Percent | numeric | Fraction of total segment size that is PASS (0–1) | 0.80 |
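The per-chromosome summary described by these columns can be reproduced from a segment table with base R. The sketch below is an illustration rather than the BafQC() code: `pass_stats()` is a hypothetical helper, the input column names (Chromosome, size, FILTER) are taken from the segment tables above, and the toy values are made up.

```r
# Toy segment table (illustrative values only).
segs <- data.frame(
  Chromosome = c("1", "1", "1", "2", "2"),
  size       = c(100, 200, 700, 400, 600),
  FILTER     = c("PASS", "FAILED", "PASS", "PASS", "PASS")
)

# Hypothetical helper: PASS counts and sizes per chromosome.
pass_stats <- function(segs) {
  per_chr <- split(segs, segs$Chromosome)
  rows <- lapply(names(per_chr), function(chr) {
    x <- per_chr[[chr]]
    pass <- x$FILTER == "PASS"
    data.frame(
      chrom                 = chr,
      Total_segment_count   = nrow(x),
      PASS_Seg_Count        = sum(pass),
      PASS_Seg_Percent      = mean(pass),
      Total_segment_size    = sum(x$size),
      PASS_Seg_Size         = sum(x$size[pass]),
      PASS_Seg_Size_Percent = sum(x$size[pass]) / sum(x$size)
    )
  })
  do.call(rbind, rows)
}

pass_stats(segs)
```

Both percentage columns are fractions between 0 and 1, matching the ranges given in the table.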
sample_CNV_annotation.tsv ( Generated by
?AnnotateSegments() function, only unique columns are
listed ).
ISCN calculation rules:
1. All segments are reported with start and end cytobands in ISCN format. However, certain considerations are made for the position of the centromere:
   a. In metacentric chromosomes, if a segment crosses the centromere and the gaps between the segment and the telomere on both sides are less than 5 Mb, only the chromosome number is reported.
   b. In metacentric chromosomes, if a segment does not cross the centromere, and the gaps between the segment and the centromere and the telomere are both less than 5 Mb, the chromosome number followed by ‘p’ or ‘q’ is reported.
   c. In acrocentric chromosomes, if the segment fulfills rule ‘b’ above, only the chromosome number is reported.
| Column | Type | Description | Example_value |
|---|---|---|---|
| p_chromStart | integer | Detectable start position of p arm | 10 |
| p_chromEnd | integer | Detectable end position of p arm | 121535434 |
| p_first_name | character | Detectable name of first cytoband in p arm | p36.33 |
| p_last_name | character | Detectable name of last cytoband in p arm | p11.2 |
| q_chromStart | integer | Detectable start position of q arm | 121535435 |
| q_chromEnd | integer | Detectable end position of q arm | 247784114 |
| q_first_name | character | Detectable name of first cytoband in q arm | q11.1 |
| q_last_name | character | Detectable name of last cytoband in q arm | qter |
| p_gap_to_tel | integer | Gap from segment start to p arm telomere | 0 |
| p_gap_to_cen | integer | Gap from segment end to p arm centromere | 10000 |
| q_gap_to_tel | integer | Gap from segment end to q arm telomere | 0 |
| q_gap_to_cen | integer | Gap from segment start to q arm centromere | 10000 |
| ISCN | character | ISCN-style cytogenetic annotation | 1p36.33-p11.2 |
| Gene | character | Overlapping gene(s) in the segment | TP53 |
| Gene_count | integer | Number of overlapping genes | 1 |
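The ISCN calculation rules listed before this table can be expressed as a small decision function. The sketch below is a hypothetical illustration, not the AnnotateSegments() logic: `iscn_region()` and its arguments are made-up names, gaps are given in base pairs, "metacentric" is treated as "not acrocentric", and the fallback label simply joins the first and last cytoband as in the ISCN example value above.

```r
# Hypothetical sketch of rules a, b and c.
# gap_to_tel: one value for a single-arm segment, or c(p, q) when the
#             segment crosses the centromere.
# gap_to_cen: gap to the centromere (single-arm segments only).
iscn_region <- function(chrom, first_band, last_band, crosses_cen,
                        gap_to_tel, gap_to_cen = NA, arm = NA,
                        acrocentric = FALSE, max_gap = 5e6) {
  near_tel <- all(gap_to_tel < max_gap)
  if (crosses_cen) {
    # Rule a: whole non-acrocentric chromosome.
    if (!acrocentric && near_tel) return(chrom)
  } else if (near_tel && !is.na(gap_to_cen) && gap_to_cen < max_gap) {
    # Rule b: whole arm; rule c: acrocentric chromosomes report the number only.
    return(if (acrocentric) chrom else paste0(chrom, arm))
  }
  # Default: chromosome followed by start and end cytobands.
  paste0(chrom, first_band, "-", last_band)
}

iscn_region("1", "p36.33", "q44", crosses_cen = TRUE, gap_to_tel = c(0, 0))
iscn_region("1", "p36.33", "p11.2", crosses_cen = FALSE,
            gap_to_tel = 0, gap_to_cen = 10000, arm = "p")
iscn_region("13", "q11", "q34", crosses_cen = FALSE,
            gap_to_tel = 0, gap_to_cen = 10000, arm = "q", acrocentric = TRUE)
iscn_region("1", "p36.33", "p31.1", crosses_cen = FALSE,
            gap_to_tel = 0, gap_to_cen = 5e7, arm = "p")
```

The gap arguments correspond to the p_gap_to_tel, p_gap_to_cen, q_gap_to_tel and q_gap_to_cen columns in the table above.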
sample_CNV_plot.png (Generated by ?RunPlotCNV()
function). The
CNV Plot shows a genome-wide summary of the copy number (top track),
B-allele frequency (BAF, second track) data, tumor fraction (ccf, third
track) and quality of segment (bottom track). The Copy Number (CN), on
the Y-axis, is a linear count of the number of copies of each chromosome
in the tumor cells, taking tumor purity and tumor fraction into account.
Each chromosome is plotted as a set of dots that collectively show the
estimated sequence coverage for the chromosome, and as a narrow
turquoise line that shows the final CN call for the chromosome. The BAF
plot shows the variant allele fraction of SNPs across the genome with
the same coloration used in the Copy Number plot. When the copy number
of a chromosome changes, the BAF plot for an affected chromosome splits
due to imbalance in chromosome counts. The variance of B-allele
frequencies is quite high so the splitting of the BAF may be difficult
to discern. To assist with interpreting the BAF plot, a turquoise line
is drawn at the median level to show the imbalance.
For a complete list of all functions and their parameters, please visit the XploR function reference.
Each function page includes detailed parameter descriptions, usage examples, and links to related documentation.